Supplementary Material for Enhancing Motion Deblurring in High-Speed Scenes with Spike Streams
Shiyan Chen
All RSTB blocks consist of 6 STB blocks. Each sequence contains 33 frames. Blurry images with different motion magnitudes are generated by averaging the surrounding 33 or 65 frames. From Tab. S1, we observe that the introduction of CAMMA also improves deblurring performance across all settings. We have added comparisons of computational complexity and inference time in Tab.
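The blur-synthesis procedure described above (a blurry image as the average of the 33 or 65 sharp frames around a reference frame) can be sketched directly. This is a minimal numpy illustration of that averaging step, not the authors' data-generation code; the array shapes and function name are assumptions.

```python
import numpy as np

def synthesize_blur(frames, center, window):
    """Average the `window` frames surrounding `center` to mimic a long exposure.

    frames: (T, H, W, 3) float array of sharp high-frame-rate images.
    window: odd number of frames to average (e.g. 33 or 65, as in the text).
    """
    half = window // 2
    lo, hi = center - half, center + half + 1
    assert lo >= 0 and hi <= len(frames), "window exceeds sequence bounds"
    return frames[lo:hi].mean(axis=0)

# Toy usage: 65 random "sharp" frames, blur formed from the middle 33.
frames = np.random.rand(65, 4, 4, 3).astype(np.float32)
blurry = synthesize_blur(frames, center=32, window=33)
```

Larger windows average more motion into one frame, which is how the different motion magnitudes mentioned above are produced.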
Enhancing Motion Deblurring in High-Speed Scenes with Spike Streams
Traditional cameras produce desirable vision results but struggle with motion blur in high-speed scenes due to long exposure windows. Existing frame-based deblurring algorithms face challenges in extracting useful motion cues from severely blurred images. Recently, an emerging bio-inspired vision sensor known as the spike camera has achieved an extremely high frame rate while preserving rich spatial details, owing to its novel sampling mechanism. However, typical binary spike streams are relatively low-resolution, degraded image signals devoid of color information, making them unfriendly to human vision. In this paper, we propose a novel approach that integrates the two modalities from two branches, leveraging spike streams as auxiliary visual cues for guiding deblurring in high-speed motion scenes. We propose the first spike-based motion deblurring model with bidirectional information complementarity. We introduce a content-aware motion magnitude attention module that utilizes a learnable mask to extract relevant information from blurry images effectively, and we incorporate a transposed cross-attention fusion module to efficiently combine features from both spike data and blurry RGB images. Furthermore, we build two extensive synthesized datasets for training and validation purposes, encompassing high-temporal-resolution spikes, blurry images, and corresponding sharp images. The experimental results demonstrate that our method effectively recovers clear RGB images from highly blurry scenes and outperforms state-of-the-art deblurring algorithms in multiple settings.
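The content-aware motion magnitude attention module above gates blurry-image features with a learnable mask. The paper's module is a trained network; the following is only a numpy sketch of the gating idea, assuming the mask is a per-pixel sigmoid weight applied over all channels (function names and shapes are illustrative).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def magnitude_attention(blur_feat, mask_logits):
    """Gate blurry-image features with a per-pixel attention mask.

    blur_feat:   (C, H, W) features from the blurry RGB branch.
    mask_logits: (1, H, W) unnormalized mask scores (learned in the paper;
                 passed in here purely for illustration).
    """
    mask = sigmoid(mask_logits)   # values in (0, 1): per-pixel attention weights
    return blur_feat * mask       # broadcast the mask across all channels

feat = np.random.rand(8, 4, 4)
logits = np.zeros((1, 4, 4))      # sigmoid(0) = 0.5 everywhere
out = magnitude_attention(feat, logits)
```

Regions the mask scores highly (e.g. mildly blurred areas with reliable content) pass through strongly, while heavily blurred regions are attenuated, which matches the stated goal of extracting relevant information from blurry images.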
Unsupervised Optical Flow Estimation with Dynamic Timing Representation for Spike Camera
Efficiently selecting an appropriate spike stream data length to extract precise information is key to spike-based vision tasks. To address this issue, we propose a dynamic timing representation for spike streams. Built on a multi-layer architecture, it applies dilated convolutions along the temporal dimension to extract features on multiple temporal scales with few parameters.
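The key property of the dilated temporal convolutions described above is that one small kernel covers progressively longer time spans as the dilation rate grows, giving multi-temporal-scale features without extra parameters. A minimal numpy sketch of a 1-D dilated convolution over a binary spike stream (the kernel and dilation rates are illustrative, not the paper's):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D convolution along time with the given dilation rate."""
    k = len(kernel)
    span = (k - 1) * dilation + 1            # effective receptive field
    out_len = len(x) - span + 1
    idx = np.arange(0, span, dilation)       # time steps the kernel taps
    return np.array([np.dot(x[t + idx], kernel) for t in range(out_len)])

# Stacking dilation rates covers multiple temporal scales with one 3-tap kernel:
# receptive fields of 3, 5, and 9 time steps for dilations 1, 2, and 4.
spikes = np.random.randint(0, 2, size=64).astype(np.float32)  # binary stream
kernel = np.ones(3, dtype=np.float32) / 3.0                   # 3-tap average
scales = [dilated_conv1d(spikes, kernel, d) for d in (1, 2, 4)]
```

In a learned network the kernels would be trainable and the outputs of the different dilation rates combined, but the parameter count stays that of a single small kernel per scale.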
Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras
The spiking camera is an emerging neuromorphic vision sensor that records high-speed motion scenes by asynchronously firing continuous binary spike streams. Prevailing image reconstruction methods, generating intermediate frames from these spike streams, often rely on complex step-by-step network architectures that overlook the intrinsic collaboration of spatio-temporal complementary information. In this paper, we propose an efficient spatio-temporal interactive reconstruction network to jointly perform inter-frame feature alignment and intra-frame feature filtering in a coarse-to-fine manner. Specifically, it starts by extracting hierarchical features from a concise hybrid spike representation, then refines the motion fields and target frames scale-by-scale, ultimately obtaining a full-resolution output. Meanwhile, we introduce a symmetric interactive attention block and a multi-motion field estimation block to further enhance the interaction capability of the overall network. Experiments on synthetic and real-captured data show that our approach exhibits excellent performance while maintaining low model complexity.
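The coarse-to-fine, scale-by-scale refinement described above can be sketched as a loop over a feature pyramid: start from the coarsest scale, upsample the running estimate, and refine it with that scale's features until full resolution is reached. This is only a structural numpy illustration; the real per-scale refinement is a neural network, stubbed out here as a callable.

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbour 2x upsampling of a 2-D map."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine(pyramid, refine):
    """Refine an estimate scale-by-scale, coarsest scale first.

    pyramid: list of (H_i, W_i) feature maps, finest first, coarsest last.
    refine:  stand-in for the per-scale refinement network.
    """
    est = np.zeros_like(pyramid[-1])          # initial coarse estimate
    for feat in reversed(pyramid):
        if est.shape != feat.shape:
            est = upsample2x(est)             # carry the coarse estimate up
        est = refine(est, feat)               # intra-scale refinement step
    return est                                # full-resolution output

pyramid = [np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))]
out = coarse_to_fine(pyramid, refine=lambda e, f: e + f)
```

The same skeleton accommodates the paper's joint motion-field and frame refinement by letting `refine` update both quantities at each scale.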
SpikeGrasp: A Benchmark for 6-DoF Grasp Pose Detection from Stereo Spike Streams
Zhuoheng Gao, Jiyao Zhang, Zhiyong Xie, Hao Dong, Zhaofei Yu, Rongmei Chen, Guozhang Chen, Tiejun Huang
Most robotic grasping systems rely on converting sensor data into explicit 3D point clouds, which is a computational step not found in biological intelligence. This paper explores a fundamentally different, neuro-inspired paradigm for 6-DoF grasp detection. We introduce SpikeGrasp, a framework that mimics the biological visuomotor pathway, processing raw, asynchronous events from stereo spike cameras, similarly to retinas, to directly infer grasp poses. Our model fuses these stereo spike streams and uses a recurrent spiking neural network, analogous to high-level visual processing, to iteratively refine grasp hypotheses without ever reconstructing a point cloud. To validate this approach, we built a large-scale synthetic benchmark dataset. Experiments show that SpikeGrasp surpasses traditional point-cloud-based baselines, especially in cluttered and textureless scenes, and demonstrates remarkable data efficiency. By establishing the viability of this end-to-end, neuro-inspired approach, SpikeGrasp paves the way for future systems capable of the fluid and efficient manipulation seen in nature, particularly for dynamic objects.

The ability to pick up an arbitrary object is a fundamental measure of intelligence for an autonomous robot. The prevailing approach to this grasp detection problem follows a distinct geometry-first pipeline: capture a scene with sensors, reconstruct a 3D geometric model (typically a point cloud), and then analyze this model for a viable grasp (Fang et al., 2020; Gui et al., 2025). This paradigm is logical from a computer graphics perspective, but is a significant departure from how biological systems operate. The brain does not compute or store explicit point clouds to decide how to grasp a coffee cup (Cao et al., 2025); it leverages a continuous stream of sensory information processed through a highly efficient neural architecture.